Thyroid cancer can develop from the different cells that form the follicles of the thyroid. This gland located at the base of the throat secretes hormones such as T3 and T4 that have their metabolic functionalities such as control of heart rate, blood pressure, body temperature and weight. See it on TCGA.
From the four types of thyroid cancer (papillary, follicular, medullary, and anaplastic thyroid cancer) Papillary Thyroid Carcinoma (PTC) is the most common type of thyroid cancer (Agrawal et al., 2014). It is more prevalent in women than men and its common diagnosis occurs around 49 years old.
Driver onco-mutations for this type of cancer appear as alterations of the MAPK signaling pathway and the PI3K-AKT pathway (Agrawal et al., 2014). Both are implied in cell proliferation and survival and in human tumorigenesis. The overactivation of the MAPK pathway because of mutations such as the BRAFV600E mutation, leads to the development of papillary thyroid cancer (PTC) from follicular thyroid cells (Xing, 2013). On the other hand, mutations that activate the PI3K-AKT pathway, such as mutations in RAS, PTEN and PIK3CA, mostly leads to development of follicular thyroid adenoma (FTA) and follicular thyroid cancer (FTC) also in follicular thyroid cells (Xing, 2013).
Research from The Cancer Genome Atlas (TCGA) focused on PTC and they found that a part from alterations in BRAF (specifically BRAFV600E) and RAS, there are other driver mutations such the ones in EIF1AX PPM1D, and CHEK2 genes that are main alterations for that type of thyroid cancer(Agrawal et al., 2014).
We start importing the raw table of counts.
library(SummarizedExperiment)
thca <- readRDS("seTHCA.rds")
thca
class: RangedSummarizedExperiment
dim: 20115 572
metadata(5): experimentData annotation cancerTypeCode
cancerTypeDescription objectCreationDate
assays(1): counts
rownames(20115): 1 2 ... 102724473 103091865
rowData names(3): symbol txlen txgc
colnames(572): TCGA.4C.A93U.01A.11R.A39I.07
TCGA.BJ.A0YZ.01A.11R.A10U.07 ... TCGA.KS.A41J.11A.12R.A23N.07
TCGA.KS.A41L.11A.11R.A23N.07
colData names(549): type bcr_patient_uuid ...
lymph_nodes_aortic_pos_by_ihc lymph_nodes_aortic_pos_total
Explore the column (phenotypic) data, which in this case corresponds to clinical variables, and their corresponding metadata.
dim(colData(thca))
[1] 572 549
col.data <- colData(thca)
col.data[1:5, 1:5]
DataFrame with 5 rows and 5 columns
type bcr_patient_uuid
<factor> <factor>
TCGA.4C.A93U.01A.11R.A39I.07 tumor E1C8AF06-9EA4-4062-9264-AA916E0025D1
TCGA.BJ.A0YZ.01A.11R.A10U.07 tumor f3ee888e-5116-4313-a2f6-3b1dcc3e4bc1
TCGA.BJ.A0Z0.01A.11R.A10U.07 tumor 72d8dcc3-0709-4fd1-98d4-fb75e9340758
TCGA.BJ.A0Z2.01A.11R.A10U.07 tumor f9ceffc0-d544-418d-b4a9-bd3c84e37026
TCGA.BJ.A0Z3.01A.11R.A13Y.07 tumor 331cae6e-2868-4c58-9302-709a9ff7d025
bcr_patient_barcode form_completion_date
<factor> <factor>
TCGA.4C.A93U.01A.11R.A39I.07 TCGA-4C-A93U 2014-11-12
TCGA.BJ.A0YZ.01A.11R.A10U.07 TCGA-BJ-A0YZ 2011-4-11
TCGA.BJ.A0Z0.01A.11R.A10U.07 TCGA-BJ-A0Z0 2011-4-15
TCGA.BJ.A0Z2.01A.11R.A10U.07 TCGA-BJ-A0Z2 2011-4-15
TCGA.BJ.A0Z3.01A.11R.A13Y.07 TCGA-BJ-A0Z3 2011-6-2
prospective_collection
<factor>
TCGA.4C.A93U.01A.11R.A39I.07 YES
TCGA.BJ.A0YZ.01A.11R.A10U.07 NO
TCGA.BJ.A0Z0.01A.11R.A10U.07 NO
TCGA.BJ.A0Z2.01A.11R.A10U.07 NO
TCGA.BJ.A0Z3.01A.11R.A13Y.07 NO
col.data.meta <- mcols(colData(thca), use.names = TRUE)
col.data.meta
DataFrame with 549 rows and 2 columns
labelDescription
<character>
type sample type (tumor/normal)
bcr_patient_uuid bcr patient uuid
bcr_patient_barcode bcr patient barcode
form_completion_date form completion date
prospective_collection tissue prospective collection indicator
... ...
lymph_nodes_pelvic_pos_total total pelv lnp
lymph_nodes_aortic_examined_count total aor lnr
lymph_nodes_aortic_pos_by_he aln pos light micro
lymph_nodes_aortic_pos_by_ihc aln pos ihc
lymph_nodes_aortic_pos_total total aor-lnp
CDEID
<character>
type NA
bcr_patient_uuid NA
bcr_patient_barcode 2673794
form_completion_date NA
prospective_collection 3088492
... ...
lymph_nodes_pelvic_pos_total 3151828
lymph_nodes_aortic_examined_count 3104460
lymph_nodes_aortic_pos_by_he 3151832
lymph_nodes_aortic_pos_by_ihc 3151831
lymph_nodes_aortic_pos_total 3151827
Now, explore the row (feature) data.
rowRanges(thca)
GRanges object with 20115 ranges and 3 metadata columns:
seqnames ranges strand | symbol txlen
<Rle> <IRanges> <Rle> | <character> <integer>
1 chr19 [58345178, 58362751] - | A1BG 3322
2 chr12 [ 9067664, 9116229] - | A2M 4844
9 chr8 [18170477, 18223689] + | NAT1 2280
10 chr8 [18391245, 18401218] + | NAT2 1322
12 chr14 [94592058, 94624646] + | SERPINA3 3067
... ... ... ... . ... ...
100996331 chr15 [20835372, 21877298] - | POTEB 1706
101340251 chr17 [40027542, 40027645] - | SNORD124 104
101340252 chr9 [33934296, 33934376] - | SNORD121B 81
102724473 chrX [49303669, 49319844] + | GAGE10 538
103091865 chr21 [39313935, 39314962] + | BRWD1-IT2 1028
txgc
<numeric>
1 0.5644190
2 0.4882329
9 0.3942982
10 0.3895613
12 0.5249429
... ...
100996331 0.4308324
101340251 0.4903846
101340252 0.4074074
102724473 0.5055762
103091865 0.5924125
-------
seqinfo: 455 sequences (1 circular) from hg38 genome
Looking deeper into col.data with:
col.data[1:5, 30:35]
DataFrame with 5 rows and 6 columns
lymph_nodes_examined_count
<factor>
TCGA.4C.A93U.01A.11R.A39I.07 YES
TCGA.BJ.A0YZ.01A.11R.A10U.07 [Not Available]
TCGA.BJ.A0Z0.01A.11R.A10U.07 YES
TCGA.BJ.A0Z2.01A.11R.A10U.07 NO
TCGA.BJ.A0Z3.01A.11R.A13Y.07 YES
lymph_nodes_examined
<factor>
TCGA.4C.A93U.01A.11R.A39I.07 41
TCGA.BJ.A0YZ.01A.11R.A10U.07 [Not Available]
TCGA.BJ.A0Z0.01A.11R.A10U.07 5
TCGA.BJ.A0Z2.01A.11R.A10U.07 [Not Available]
TCGA.BJ.A0Z3.01A.11R.A13Y.07 3
lymph_nodes_examined_he_count
<factor>
TCGA.4C.A93U.01A.11R.A39I.07 1
TCGA.BJ.A0YZ.01A.11R.A10U.07 [Not Available]
TCGA.BJ.A0Z0.01A.11R.A10U.07 0
TCGA.BJ.A0Z2.01A.11R.A10U.07 [Not Available]
TCGA.BJ.A0Z3.01A.11R.A13Y.07 0
weiss_score_overall mitoses_per_50_hpf
<factor> <factor>
TCGA.4C.A93U.01A.11R.A39I.07 NA NA
TCGA.BJ.A0YZ.01A.11R.A10U.07 NA NA
TCGA.BJ.A0Z0.01A.11R.A10U.07 NA NA
TCGA.BJ.A0Z2.01A.11R.A10U.07 NA NA
TCGA.BJ.A0Z3.01A.11R.A13Y.07 NA NA
ajcc_pathologic_tumor_stage
<factor>
TCGA.4C.A93U.01A.11R.A39I.07 Stage IVA
TCGA.BJ.A0YZ.01A.11R.A10U.07 Stage II
TCGA.BJ.A0Z0.01A.11R.A10U.07 Stage II
TCGA.BJ.A0Z2.01A.11R.A10U.07 Stage IVC
TCGA.BJ.A0Z3.01A.11R.A13Y.07 Stage I
We observed that there are different ways to indicate that the data is not available (NA, [Not Available]…). Consequently, we decide to study which variables have more information. To do so, we create two functions, one (na.replace) to change those values into the standarized NA nomenclature, and filter.info to quantify how many of each variable is not NA.
na.replace <- function(x) {
# Remove the unwanted factors
lev <- levels(x)
lev <- lev[!(lev %in% "[Not Available]")]
lev <- lev[!(lev %in% "[Unknown]")]
lev <- lev[!(lev %in% "[Not Applicable]")]
lev <- lev[!(lev %in% "[Not Available]|[Not Available]|[Not Available]")]
lev <- lev[!(lev %in% "[Not Evaluated]")]
lev <- lev[!(lev %in% "None")]
x <- factor(x, levels = lev)
}
filter.info <- function(x) {
# Function to filter the NA, [Not Available], [Not Applicable] and [Unknown]
# Return the proportion of information on x
nas <- sum(is.na(x))
return(1 - (nas/length(x)))
}
We apply these functions to the col.data and we visualize it:
meta.info <- as.data.frame(lapply(col.data, na.replace))
variable.info <- sapply(meta.info, filter.info)
# Representing the information by column vs total information
relative.info <- variable.info[order(variable.info)]/sum(variable.info)
plot(relative.info, main = "Information brought by column")
# Plotting the histogram of the information
hist(variable.info, main = "Variables completness", xlab = "% of not NA of each variable",
ylab = "Counts")
# Seeing the most informative to set a threshold, frequency are the number
# of columns with such % of information
hist(variable.info, xlim = c(0.3, 1), ylim = c(0, 35), main = "Variables completness",
xlab = "% of not NA of each variable", ylab = "Counts")
As we can see, in many variables the main value is just NA or other derivatives meaning “No information available for this variable”. Therefore, in order to keep the most informative variables, we apply a threshold of 60% (of the values containing informative data) to select the relevant ones:
filtered <- meta.info[, variable.info > 0.6]
dim(filtered)
[1] 572 44
meta.info.summary2 <- col.data.meta[variable.info > 0.6, ]
To explore the data we first create an object with ggplot:
library("ggplot2")
fplot <- ggplot(as.data.frame(filtered))
## ggplot options for axes and background color Options of the labels
l <- theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust = 1))
# Removing the background
b <- theme(axis.line = element_line(colour = "black"), panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(colour = "grey", size = 0.5, linetype = 2),
panel.grid.minor.y = element_line(colour = "grey", size = 0.25, linetype = 3),
panel.border = element_blank(), panel.background = element_blank())
y.axis <- ylab("number of cases")
And we plot some relationships for some significant variables: In the “normal” type samples most of the information is not available. This could be due to the fact that all “normal” samples were obtained from patients with tumors. So in order to have the maximum information of each sample, we infer “normal” samples information from its paired sample data (mainly gender and ethnicity).
And we plot some relationships for some significant variables: In the “normal” type samples most of the information is not available. This could be due to the fact that all “normal” samples were obtained from patients with tumors. So in order to have the maximum information of each sample, we infer “normal” samples information from its paired sample data (mainly gender and ethnicity).
fplot + geom_bar(aes(type, fill = gender)) + b + y.axis + ggtitle("Gender distribution across normal and tumour samples")
As expected, there are more female samples than male samples. As we mentioned before, thyroid cancer is more prevalent in women than men and we can see that not only for tumor samples but for normals samples too. In addition, there are more tumor cases than normal cases.
fplot + geom_bar(aes(type, fill = tumor_status)) + b + y.axis + ggtitle("Tumor status across normal and tumour samples")
We also have examined the tumor status of the tumor samples. There are more cancer patients without tumor (“tumor free”) than with the tumor (“with tumor”) after the collection of the data. We assume that most of the patients that participate in the study have recovered after it.
fplot + facet_wrap(~gender) + geom_bar(aes(type, fill = ethnicity)) + b + y.axis +
ggtitle("Ethnicity distribution across normal and tumour samples")
In this plot we can see clearly the lack of data for some variables regarding normal samples. In other words, there is no record for normal samples neither females nor males about the hispanic origin of the individuals.
labels <- c("AM. INDIAN OR ALASKA NAT.", "ASIAN", "BLACK OR AFRICAN AM.", "WHITE",
"NA")
fplot + facet_wrap(~ethnicity) + geom_bar(aes(race, fill = gender)) + b + l +
y.axis + scale_x_discrete(breaks = waiver(), labels = labels) + ggtitle("Gender distribution across different ethnicity and race.")
If we have a look at the ethnicities, gender and hispanic origin, the data shows more cases for “white” individuals “not hispanic or latino” than “white” individuals hispanic or latino.
fplot + geom_bar(aes(x = gender, fill = race)) + b + y.axis + ggtitle("Ethnicity distribution in male and female")
If we ignore the hispanic origin of the individuals and just focus on the main ethnicities of the study we can observe that for both female and male samples there is a majority of “white” individuals. However, the fact that “white” subjects are hispanic or not can provide some variability so that is something we will take into account later.
fplot + facet_wrap(~type) + geom_bar(aes(tumor_focality, fill = gender)) + b +
l + y.axis + xlab("tumor focality") + ggtitle("Gender distribution in different tumour focality")
There is no data about tumor focality in the normal samples…
labels <- c("Other specify", "Papillary - Classical", "Papillary - Follicular",
"Papillary - Tall Cell", "NA")
hd.xaxis <- scale_x_discrete(name = "histologic diagnosis", breaks = waiver(),
labels = labels)
fplot + geom_bar(aes(histologic_diagnosis, fill = gender)) + b + l + hd.xaxis +
y.axis + ggtitle("Gender distribution across different types of hystologic diagnosis")
Here, we can see that the tumor samples came mainly from classical Papillary Thyroid Carcinoma. We expect differential expression between the different thyroid cancer phenotypes so this might be a source of variability.
labels <- c("Other specify", "Papillary - Classical", "Papillary - Follicular",
"Papillary - Tall Cell", "NA")
ggplot(filtered[!is.na(filtered$age_at_diagnosis), ]) + geom_bar(aes(age_at_diagnosis,
na.rm = TRUE, fill = histologic_diagnosis)) + b + l + labs(x = "age at diagnosis") +
y.axis + scale_fill_discrete(name = "histologic diagnosis", breaks = waiver(),
labels = labels) + ggtitle("Histologic diagnosis distribution across different ages")
Even thought, thyroid cancer is commontly diagnosed at mature ages (~50) this plot shows that the classical papillary thyroid cancer has similar representation in all ages for this dataset.
As a summary from the previous plots, the most abundant population in the samples is the “white” and “not hispanic or latino”, so we decide to perform the analysis on this group of individuals, whose tumoral sample correspond to the most common cancer in thyroids: Papillary Thyroid Carcinoma(PTC).
With this approach we want to avoid to include in the analysis, samples that introduce variability (mainly differential expression due to enviromental or epigenetic factors) as much as possible. To have more insight on the disease we want to have just paired samples, that is, samples from the same participant. So we filter the data accordingly:
# Paired data
paired <- intersect(thca[, thca$type == "tumor"]$bcr_patient_barcode, thca[,
thca$type == "normal"]$bcr_patient_barcode)
paired_mask <- thca$bcr_patient_barcode %in% paired
filtered_filt <- filtered[paired_mask, ]
# Imposing the selected conditions
tumor_names <- row.names(subset(filtered_filt, type == "tumor" & race == "WHITE" &
histologic_diagnosis == "Thyroid Papillary Carcinoma - Classical/usual" &
ethnicity == "NOT HISPANIC OR LATINO"))
# Extracting the participants identifiers
participants <- unlist(lapply(tumor_names, substr, 9, 12))
participants <- participants[participants != ""]
check <- function(x, checklist) {
# Returning name of the sample just if the name is the same as in checklist
a <- substr(x, 9, 12)
if (a %in% checklist) {
return(x)
}
}
# Extracting sample names of tumor and normal data.
sample_names <- unlist(lapply(row.names(filtered_filt), check, checklist = participants))
thca.filtered <- thca[, sample_names]
We explore again the variables for this subset. In this case, we show just some of the relationship plots we have seen previously:
filtered_filt <- filtered_filt[row.names(filtered_filt) %in% sample_names, ]
mplot <- ggplot(filtered_filt) + y.axis
mplot + geom_bar(aes(type, fill = gender)) + b + ggtitle("Ethnicity and type of sample after filtering")
This plot just checks whether we have the same number of cases in normal sample and tumor sample so that we have done correctly the filtering.
mplot + facet_wrap(~gender) + geom_bar(aes(type, fill = ethnicity)) + b + ggtitle("Hispanic origin, gender and type of sample after filtering")
Hispanic origin data is only available for tumor samples but as we have mentioned before we can extrapolate ethinicity and origin to the normal samples because they are paired (see below).
mplot + geom_bar(aes(x = gender, fill = ethnicity)) + b + l + ggtitle("Gender samples and hispanic origin after filtering")
Same proportion of being or not being “hispanic or latino” in females and males. Subset filtering has not modified the fact that we have more female samples than males.
mplot + geom_bar(aes(histologic_diagnosis, fill = gender)) + b + l + scale_x_discrete(name = "histologic diagnosis",
breaks = waiver(), labels = c("Papillary - Classical", "NA")) + ggtitle("Gender distribution across different types of hystologic diagnosis
after filtering")
Papillary Classical Thyroid cancer in the same proportion as “NA” because we have the same number of normal cases and classical thyroid tumor cases.
ggplot(filtered_filt[!is.na(filtered_filt$age_at_diagnosis), ]) + geom_bar(aes(age_at_diagnosis,
fill = histologic_diagnosis)) + b + l + y.axis + xlab("age at diagnosis") +
scale_fill_discrete(name = "histologic diagnosis", breaks = waiver(), labels = c("Papillary - Classical",
"NA")) + ggtitle("Histologic diagnosis distribution across different ages after filtering")
Number of cases across ages is more or less similar after the filtering exept for some ages that have more cases around 30, 34, 42 and 47.
To perform quality assessment and normalization we need first to load the edgeR R/Bioconductor package and create a `DGEList’ object.
library(edgeR)
# Normalization just with the selected samples
dge.subset <- DGEList(counts = assays(thca.filtered)$counts, genes = as.data.frame(mcols(thca.filtered)))
dge <- DGEList(counts = assays(thca)$counts, genes = as.data.frame(mcols(thca)))
Now calculate \(\log_2\) CPM values of expression and put them as an additional assay element to ease their manipulation.
assays(thca.filtered)$logCPM <- cpm(dge.subset, log = TRUE, prior.count = 3)
assays(thca.filtered)$logCPM[1:5, 1:5]
TCGA.BJ.A28R.01A.11R.A16R.07 TCGA.BJ.A28W.01A.11R.A32Y.07
1 3.014910 3.895927
2 8.681899 9.333470
9 -4.164318 -4.164318
10 -4.164318 -4.164318
12 4.423408 4.912253
TCGA.BJ.A2N7.01A.11R.A18C.07 TCGA.BJ.A2N8.01A.11R.A18C.07
1 2.074429 3.634778
2 9.313603 8.873442
9 -4.164318 -4.164318
10 -4.164318 -4.164318
12 4.523368 4.366761
TCGA.BJ.A2N9.01A.11R.A18C.07
1 3.436989
2 8.467420
9 -4.164318
10 -4.164318
12 4.584505
Let’s examine the library sizes in terms of total number of sequence read counts per sample. Figure 5 below shows library sizes per sample in increasing order.
Plotting for all samples makes it impossible to see accurately the blue and red colors corresponding to tumor and normal respectively.
After the filtering the subset plot allows one to see that there is similar heterogeneity in tumor and normal library sizes. In addition, this figure reveals substantial differences in sequencing depth between samples and we may consider discarding those samples whose depth is substantially lower than the rest. To identify which are these samples we may simply look at the actual numbers including portion of the sample identifier that distinguishes them.
sampledepth <- round(dge.subset$sample$lib.size/1e+06, digits = 1)
names(sampledepth) <- substr(sample_names, 6, 12)
qqnorm(sampledepth)
qqline(sampledepth)
Looking into the qqplot we decide to filter out those below 40 but keeping only those that are paired:
sample_names <- sample_names[sampledepth > 40]
keep.paired <- function(pattern, x) {
# Return the names if it is paired
pos <- grep(pattern, x, fixed = TRUE)
if (length(pos) > 1) {
return(x[pos])
}
}
sample_names <- unique(unlist(lapply(substr(sample_names, 9, 12), keep.paired,
sample_names)))
thca.filtered <- thca.filtered[, colnames(thca.filtered) %in% sample_names]
dge.subset <- DGEList(counts = assays(thca.filtered)$counts, genes = mcols(thca.filtered))
Warning in as.data.frame(x, row.names = NULL, optional = optional, ...):
Arguments in '...' ignored
Let’s look at the distribution of expression values per sample in terms of logarithmic CPM units. Due to the large number of samples, we display tumor and normal samples separately, and are shown in Figure 7.
library(geneplotter)
par(mfrow = c(1, 2))
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"tumor"])$logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Tumor samples",
las = 1)
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"normal"])$logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Normal samples",
las = 1)
These two spikes are due to the lowly expressed genes and the highly expressed genes, but in the lowly expressed genes the effect of GC content and the read length may have higher impact.
To see the impact of GC and length on the number of read we plot them:
gc_len <- rowRanges(thca.filtered)
cor(rowMeans(assays(thca.filtered)$counts), gc_len$txgc)
[1] -0.01173666
cor(rowMeans(assays(thca.filtered)$counts), gc_len$txlen)
[1] -0.0008702022
par(mfrow = c(1, 2))
plot(log10(rowMeans(assays(thca.filtered)$counts)), gc_len$txgc, main = "GC content over expression",
xlab = "Expression (log10)", ylab = "GC")
plot(log10(rowMeans(assays(thca.filtered)$counts)), log10(gc_len$txlen), main = "Length over the expression",
xlab = "Expression (log10)", ylab = "Length (log10)")
There is a low correlation on the counts and the GC content or the length but as there are clearly two groups based on the length, so we will normalize also using the length with the package cqn:
library("cqn")
cqn.subset <- cqn(assays(thca.filtered)$counts, x = gc_len$txgc, lengths = gc_len$txlen)
par(mfrow = c(1, 2))
boxplot(cqn.subset$func1, names = FALSE, xlab = "Samples", main = "Estimated effect of GC")
boxplot(cqn.subset$func2, names = FALSE, xlab = "Samples", main = "Estimated effect of length")
As we thought the effect of the length is higher than the effect of GC content
We can see if the distribution of expression values per sample in terms of logarithmic CPM units. Due to the large number of samples, we display tumor and normal samples separately:
cqn.logCPM <- cqn.subset$y + cqn.subset$offset
assays(thca.filtered)$cqn.logCPM <- cqn.logCPM
par(mfrow = c(1, 2))
multidensity(as.list(as.data.frame(cqn.logCPM[, thca.filtered$type == "tumor"])),
xlab = "log 2 CPM", legend = NULL, main = "Tumor samples", las = 1)
multidensity(as.list(as.data.frame(cqn.logCPM[, thca.filtered$type == "normal"])),
xlab = "log 2 CPM", legend = NULL, main = "Normal samples", las = 1)
These “after-normalization” plots are quite better than the previous ones used without GC and length normalization on the second spike. With this normalization we do not appreciate substantial differences between the samples in the distribution of expression values. However, below 0 there is a lot of variability.
Let’s calculate now the average expression per gene through all the samples. Figure 9 shows the distribution of those values across genes.
par(mfrow = c(1, 2))
avgexp <- rowMeans(assays(thca.filtered)$logCPM)
hist(avgexp, xlab = "log2 CPM", main = "Using DGEList normalization", las = 1)
abline(v = 1, col = "red", lwd = 2)
# Using the corrected logCPM
avgexp.cqn <- rowMeans(assays(thca.filtered)$cqn.logCPM)
hist(avgexp.cqn, xlab = "log2 CPM", main = "Using cqn normalization", las = 1)
abline(v = 1, col = "red", lwd = 2)
In the light of previous plot, we may consider a cutoff of 1 log CPM unit as minimum value of expression to select genes being expressed across samples. Using this cutoff we proceed to filter out lowly-expressed genes.
mask <- avgexp.cqn > 1
sum(mask)
[1] 11801
thca.filtered <- thca.filtered[mask, ]
dge.subset <- dge.subset[mask, ]
Since we will use the offsets computed by cqn, there is no need to normalize using the normalization tools from edgeR, such as calcNormFactors according to cqn vignette.
dge.subset$offset <- cqn.subset$glm.offset
dge.norm <- calcNormFactors(dge.subset)
assays(thca.filtered)$n.logCPM <- cpm(dge.norm, log = TRUE, prior.count = 3)
par(mfrow = c(3, 2))
# Not normalized but filtered
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"tumor"])$logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Tumor samples",
las = 1)
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"tumor"])$logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Normal samples",
las = 1)
# Normalize with calcNormFactors after filtering out those
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"tumor"])$n.logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Tumor samples",
las = 1)
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"tumor"])$n.logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Normal samples",
las = 1)
# Normalized with cqn and filtered the low values:
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"tumor"])$cqn.logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Tumor samples",
las = 1)
multidensity(as.list(as.data.frame(assays(thca.filtered[, thca.filtered$type ==
"tumor"])$cqn.logCPM)), xlab = "log 2 CPM", legend = NULL, main = "Normal samples",
las = 1)
The normalization after filtering lowly-expressed genes with ‘cqn’ package clusters samples into one thin gaussian curve whereas normalization with calcNormFactors makes it thiker.
We examine now the MA-plots of the normalized expression profiles. We look first to the tumor samples.
par(mfrow = c(8, 3), mar = c(4, 3, 4, 1))
setmp <- thca.filtered[, thca.filtered$type == "tumor"]
dgetmp <- dge.subset[, thca.filtered$type == "tumor"]
for (i in 1:ncol(setmp)) {
A <- rowMeans(assays(setmp)$cqn.logCPM)
M <- assays(setmp)$cqn.logCPM[, i] - A
samplename <- substr(as.character(setmp$bcr_patient_barcode[i]), 1, 12)
smoothScatter(A, M, main = samplename, las = 1)
abline(h = 0, col = "blue", lwd = 2)
lo <- lowess(M ~ A)
lines(lo$x, lo$y, col = "red", lwd = 2)
}
We do not observe samples with major expression-level dependent biases.
Let’s look now to the normal samples:
par(mfrow = c(8, 3), mar = c(4, 3, 4, 1))
setmp <- thca.filtered[, thca.filtered$type == "normal"]
dgetmp <- dge.subset[, thca.filtered$type == "normal"]
for (i in 1:ncol(setmp)) {
A <- rowMeans(assays(setmp)$cqn.logCPM)
M <- assays(setmp)$cqn.logCPM[, i] - A
samplename <- substr(as.character(setmp$bcr_patient_barcode[i]), 1, 12)
smoothScatter(A, M, main = samplename, las = 1)
abline(h = 0, col = "blue", lwd = 2)
lo <- lowess(M ~ A)
lines(lo$x, lo$y, col = "red", lwd = 2)
}
We do not observe either important expression-level dependent biases among the normal samples except on the sample TCGA-BJ-A3PR. Which we proceed to remove it from the cqn.logCP:
mask <- -grep("TCGA.BJ.A3PR", colnames(thca.filtered))
thca.filtered <- thca.filtered[, mask]
sample_names <- sample_names[mask]
dge.subset <- dge.subset[, mask]
We will search now for potential surrogate of batch effect indicators. Given that each sample names corresponds to a TCGA barcode (see the wiki), following the strategy described here we are going to derive different elements of the TCGA barcode and examine their distribution across samples.
tss <- substr(sample_names, 6, 7)
table(tss)
tss
BJ EL ET H2
13 33 2 4
samplevial <- substr(sample_names, 14, 16)
table(samplevial)
samplevial
01A 11A 11C
26 25 1
portion <- substr(sample_names, 18, 19)
table(portion)
portion
11 12 21 22 31
42 6 2 1 1
table(substr(sample_names, 20, 20))
R
52
plate <- substr(sample_names, 22, 25)
table(plate)
plate
A16R A180 A18C A19O A20F A21D A220 A22L A23N
2 2 5 2 10 6 6 7 12
center <- substr(sample_names, 27, 28)
table(center)
center
07
52
We can observe that the samples where collected at 4 tissues source sites but the majority of them at University of Pittsburg (BJ) and MD Anderson(EL). However,the manager center of the data is the same (University of North Carolina) and the same metabolite (RNA). In addition, they come from 7 different plates more or less distributed. However, most samples came from 2 out of 5 different portion and 2 out of 3 vials.
We are going to check if the previous variables of the barcode are surrogate of batch effect indicator. Considering our outcome of interest as molecular changes between sample types, tumor vs. normal, we will examine now the cross-classification of this outcome with the variables:
table(data.frame(TYPE = thca.filtered$type, TSS = tss))
TSS
TYPE BJ EL ET H2
normal 0 20 2 4
tumor 13 13 0 0
# We will check for the other variables
table(data.frame(TYPE = thca.filtered$type, PLATE = plate))
PLATE
TYPE A16R A180 A18C A19O A20F A21D A220 A22L A23N
normal 0 2 0 0 3 0 2 7 12
tumor 2 0 5 2 7 6 4 0 0
table(data.frame(TYPE = thca.filtered$type, Portion = portion))
Portion
TYPE 11 12 21 22 31
normal 18 4 2 1 1
tumor 24 2 0 0 0
table(data.frame(TYPE = thca.filtered$type, Vial = samplevial))
Vial
TYPE 01A 11A 11C
normal 12 13 1
tumor 14 12 0
From the tables we can already see that all of them can be candidates as indicators for batch effect as there is no homogeneity of the number of tumor and normal samples across any of the variables.
To double check if they have some effect on the batch we make the hierarchical clustering of them. First, we set some functions to plot the hierarhcical clustering:
help.dendogram <- function(x, batch, labels) {
# Helper function to plot dendograms
if (is.leaf(x)) {
## color by batch
attr(x, "nodePar") <- list(lab.col = as.vector(batch[attr(x, "label")]))
## label by outcome
attr(x, "label") <- as.vector(labels[attr(x, "label")])
}
x
}
plot.batch <- function(dendrogram, batching, se, out) {
# Function to see the batch Given a sampleDendogram use batching to colour
# the leafs by sample.names of se
sample.names = colnames(se)
batch <- as.integer(factor(batching))
names(batch) <- sample.names
i.dendrogram <- dendrapply(dendrogram, help.dendogram, batch, out)
plot(i.dendrogram, main = "Hierarchical clustering")
legend("topright", paste("Batch", levels(factor(batching))), fill = sort(unique(batch)))
}
With this functions to plot the clustering with different labelings is easier:
d <- as.dist(1 - cor(assays(thca.filtered)$cqn.logCPM, method = "spearman"))
sampleClustering <- hclust(d)
sampleDendrogram <- as.dendrogram(sampleClustering, hang = 0.1)
outcome <- paste(substr(colnames(thca.filtered), 9, 12), as.character(thca.filtered$type),
sep = "-")
names(outcome) <- colnames(thca.filtered)
par(mfrow = c(1, 1))
plot.batch(sampleDendrogram, portion, thca.filtered, outcome)
plot.batch(sampleDendrogram, samplevial, thca.filtered, outcome)
plot.batch(sampleDendrogram, tss, thca.filtered, outcome)
plot.batch(sampleDendrogram, as.character(thca.filtered$type), thca.filtered,
outcome)
plot.batch(sampleDendrogram, plate, thca.filtered, outcome)
We can see that the sample from A3PR-normal clusters with the tumoral ones, and the A3H2 tumoral clusters with the healthy patients. There might be a batch effect on the samplevial, that is the order of portion in a sequence of 100 - 120 mg sample portions. We can also observe that there are two groups of tumoral samples, and one of them clusters with one sample from a healthy person. We can see with a PCA that the tumoral and normal cells don’t create two clearly separated groups:
batch_effects <- list(portion, tss, as.character(thca.filtered$type), as.character(thca.filtered$gender))
for (batching in batch_effects) {
batch <- as.integer(factor(batching))
names(batch) <- sample_names
plotMDS(dge.subset, labels = outcome, col = batch)
legend("bottomleft", paste("Batch", levels(factor(batching))), fill = sort(unique(batch)),
inset = 0.05, cex = 0.7)
}
We can confirm that there isn’t a clear group of normal and tumors and some tumoral samples do mix with the normal ones. And it seems that the females have more variance between healthy and tumoral.
After looking for batch effect we couldn’t find a clear cause of the batch effect. We suspect that sample vial has something to do. However, we try to see if we can improve the clustering by using method ComBat for sample vial.
library("sva")
# Create the design matrix involving tumor and normal samples and males and
# females
treat <- factor(paste(thca.filtered$type, thca.filtered$gender, sep = "."))
design <- model.matrix(~0 + treat)
colnames(design) <- levels(treat)
combatexp <- ComBat(assays(thca.filtered)$cqn.logCPM, tss, mod = design[, -1])
Found 4 batches
Adjusting for 3 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting the Data
d <- as.dist(1 - cor(combatexp, method = "spearman"))
cluster.combat <- hclust(d)
cluster.combat <- as.dendrogram(cluster.combat, hang = 0.1)
names(batch) <- colnames(thca.filtered)
outcome <- paste(substr(colnames(thca.filtered), 9, 12), as.character(thca.filtered$type),
sep = "-")
names(outcome) <- colnames(thca.filtered)
plot.batch(cluster.combat, samplevial, thca.filtered, outcome)
plot.batch(cluster.combat, thca.filtered$gender, thca.filtered, outcome)
plot.batch(cluster.combat, thca.filtered$type, thca.filtered, outcome)
We can observe a slightly improvement, but still some tumoral samples cluster together with the normal ones.
To improve the batch effect removal we look if QR decomposition yields better results. In this case, we look for sample vial and TSS batch effects removal. To do so, we have to built the full model which would include gender and type as fixed effects.
library("limma")
qrexp <- removeBatchEffect(assays(thca.filtered)$cqn.logCPM, tss, design = design)
d <- as.dist(1 - cor(qrexp, method = "spearman"))
cluster.qr <- hclust(d)
cluster.qr <- as.dendrogram(cluster.qr, hang = 0.1)
plot.batch(cluster.qr, samplevial, thca.filtered, outcome)
plot.batch(cluster.qr, tss, thca.filtered, outcome)
plot.batch(cluster.qr, thca.filtered$gender, thca.filtered, outcome)
plot.batch(cluster.qr, thca.filtered$type, thca.filtered, outcome)
plot.batch(cluster.qr, tss, thca.filtered, outcome)
With QR decomposition we can observe that just one normal sample clusters with tumoral ones, and we can’t observe other factor influencing this clustering.
We further try if SVD returns a better result:
library("corpcor")
s <- fast.svd(t(scale(t(assays(thca.filtered)$cqn.logCPM), center = TRUE, scale = TRUE)))
pcSds <- s$d
pcSds[1] <- 0
svdexp <- s$u %*% diag(pcSds) %*% t(s$v)
colnames(svdexp) <- colnames(thca.filtered)
class(svdexp)
[1] "matrix"
dim(svdexp)
[1] 11801 52
d <- as.dist(1 - cor(svdexp, method = "spearman"))
cluster.svd <- hclust(d)
cluster.svd <- as.dendrogram(cluster.svd, hang = 0.1)
names(batch) <- colnames(thca.filtered)
plot.batch(cluster.svd, samplevial, thca.filtered, outcome)
plot.batch(cluster.svd, tss, thca.filtered, outcome)
plot.batch(cluster.svd, thca.filtered$type, thca.filtered, outcome)
SVD clusters together normal and tumoral samples, maybe because some of them are quite similar according to the PCA previously performed. After all the process the best one, that separates better tumor and normal samples is the qr decomposition, which will be used for following-up analysis.
assays(thca.filtered)$qrexp <- qrexp
We perform a simple examination of expression changes and their associated p-values using the R/Bioconductor package sva. We build the null model with just the intercept (=1) in order to compare it with the previous full model (including gender and type).
mod0 <- model.matrix(~1, data = colData(thca.filtered))
pv <- f.pvalue(assays(thca.filtered)$qrexp, design, mod0)
sum(p.adjust(pv, method = "fdr") < 0.01)
[1] 4984
There are 4984 genes changing significantly their expression at FDR < 1%. In Figure 18 below we show the distribution of the resulting p-values.
par(mfrow = c(1, 1))
hist(pv, main = "Histogram of p-values", las = 1)
hist(pv, main = "Histogram of p-values", xlim = c(0, 0.1), breaks = 80)
Now, let’s estimate surrogate variables using the sva() function.
sv <- sva(assays(thca.filtered)$qrexp, design, mod0)
Number of significant surrogate variables is: 9
Iteration (out of 5 ):1 2 3 4 5
sv$n
[1] 9
The SVA algorithm has found 9 surrogate variables. Let’s use them to assess against the extent of differential expression this time adjusting for these surrogate variables.
modsv <- cbind(design, sv$sv)
mod0sv <- cbind(mod0, sv$sv)
pvsv <- f.pvalue(assays(thca.filtered)$qrexp, modsv, mod0sv)
sum(p.adjust(pvsv, method = "fdr") < 0.01)
[1] 6504
We have increased the number of changing genes to 6504. Figure 19 shows the resulting distribution of p-values.
hist(pvsv, main = "Histogram of p-values adjusting surrogate variables", las = 1)
hist(pvsv, main = "Histogram of p-values adjusting surrogate variables", las = 1,
xlim = c(0, 0.1), breaks = 80)
After this preliminary analysis, we can conclude that aproximatelly 4300 genes have different expression levels among tumor and normal samples.
Instead of working with the surrogate variables in the model, we remove them for commodity to from latter on the contrasts.
# Code from https://www.biostars.org/p/121489/#121500
cleanY = function(y, mod, svs) {
X = cbind(mod, svs)
Hat = solve(t(X) %*% X) %*% t(X)
beta = (Hat %*% t(y))
rm(Hat)
gc()
P = ncol(mod)
return(y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), ]))
}
e.no_surrogates <- cleanY(assays(thca.filtered)$qrexp, design, sv$sv)
The object e.no_surrogates contains the expression after removing the surrogates variables.
We want ensembl annotation, and gene symbol of our probes, and we want also the gene ontologies from all the human genes (will be later used for functional enrichment):
library("org.Hs.eg.db")
annot <- select(org.Hs.eg.db, columns = c("SYMBOL", "ENSEMBL"), keys = rownames(thca.filtered),
keytype = "ENTREZID")
all.Human.GO <- select(org.Hs.eg.db, columns = c("GO", "SYMBOL"), key = keys(org.Hs.eg.db,
keytype = "ENTREZID"), keytype = "ENTREZID")
annot.Entrez.GO <- select(org.Hs.eg.db, columns = c("GO", "SYMBOL"), keys = rownames(thca.filtered),
keytype = "ENTREZID")
The different objects contain different information: annot contains the Symbol and Ensembl identifier from the remaining genes, all.Human.GO is the object with Gene Ontologies and Symbols for all human genes, while anot.Entrez.GO has the GO and the symbols for the genes.
Using the expression values with batch effect removed by qr decomposition and without the surrogate variables, we proceed with the voom normalization
# Transform the log2 normalized by cqn to counts without the surrogate
# variables
n_counts <- apply(e.no_surrogates, 1:2, function(x) {
2^(x)
})
# Normalize by mean variance with voom
vo <- voom(n_counts, design, plot = TRUE)
corfit <- duplicateCorrelation(vo, design, block = thca.filtered$bcr_patient_barcode)
fit <- lmFit(vo, design, block = thca.filtered$bcr_patient_barcode, correlation = corfit$consensus.correlation)
# Make the comparisons we are interested in based on our design
cm <- makeContrasts(FemaleDiseaseVsFemaleNormal = tumor.FEMALE - normal.FEMALE,
MaleDiseaseVsMaleNormal = tumor.MALE - normal.MALE, MaleDiseaseVsFemaleDisease = tumor.MALE -
tumor.FEMALE, MaleNormalVsFemaleNormal = normal.MALE - normal.FEMALE,
NormalVsDisease = (tumor.MALE + tumor.FEMALE) - (normal.MALE + normal.FEMALE),
FemaleVsMale = (tumor.FEMALE + normal.FEMALE) - (tumor.MALE + normal.MALE),
levels = design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
# Decide the threshold of differentially expressed genes
DE_threshold <- 4
lfc_threshold <- log2(DE_threshold)
res <- decideTests(fit2, lfc = lfc_threshold)
summary(res)
FemaleDiseaseVsFemaleNormal MaleDiseaseVsMaleNormal
-1 178 157
0 11422 11465
1 201 179
MaleDiseaseVsFemaleDisease MaleNormalVsFemaleNormal NormalVsDisease
-1 0 0 625
0 11799 11800 10497
1 2 1 679
FemaleVsMale
-1 3
0 11797
1 1
We decided a threshold of differentially expressed genes of 4 (thus log2 of it is 2), meaning that it must be 4 times more expressed in one group than in other in orther to pass the filter. Based on these we are only interested in four contrast which will be annotated with symbol and ensembl id:
tt_tumor <- topTable(fit2, number = Inf, coef = "NormalVsDisease", p.value = 0.05)
tt_gender <- topTable(fit2, number = Inf, coef = "FemaleVsMale", p.value = 0.05)
tt_females <- topTable(fit2, number = Inf, coef = "FemaleDiseaseVsFemaleNormal",
p.value = 0.05)
tt_males <- topTable(fit2, number = Inf, coef = "MaleDiseaseVsMaleNormal", p.value = 0.05)
# Annotate with the gene symbol
tt_tumor <- merge(tt_tumor, annot, by.x = 0, by.y = "ENTREZID", all.x = T, all.y = F)
tt_gender <- merge(tt_gender, annot, by.x = 0, by.y = "ENTREZID", all.x = T,
all.y = F)
head(tt_tumor)
Row.names logFC AveExpr t P.Value adj.P.Val
1 1 2.9732048 3.017988 10.525841 2.066196e-14 1.912406e-13
2 100 1.3931387 2.764762 6.042533 1.733250e-07 4.683783e-07
3 100033416 1.0649093 2.307230 7.527648 7.797554e-10 3.022961e-09
4 100033420 -1.7440631 4.653458 -6.743149 1.358171e-08 4.312019e-08
5 100033426 -0.3077701 6.273374 -3.925917 2.588923e-04 4.573635e-04
6 100033428 -2.2098472 1.972748 -9.777443 2.634606e-13 1.932318e-12
B SYMBOL ENSEMBL
1 22.586875 A1BG ENSG00000121410
2 6.831177 ADA ENSG00000196839
3 12.208326 SNORD116-4 ENSG00000275529
4 9.043603 SNORD116-8 ENSG00000207093
5 -0.793124 SNORD116-14 ENSG00000206621
6 20.109476 SNORD116-16 ENSG00000207263
Diagnostics plot:
pars <- par(mfrow = c(1, 2), mar = c(4, 5, 2, 2))
hist(tt_gender$P.Value, xlab = "Raw P-values", main = "", las = 1)
qqt(fit2$t[, 6], df = fit2$df.prior + fit2$df.residual[6], main = "", pch = ".",
cex = 3)
qqline(fit2$t[, 6], lwd = 2, col = "red")
abline(0, 1, lwd = 2)
mtext("Comparing genes between females and males patients", outer = TRUE, cex = 1.5,
side = 3, line = -2)
pars <- par(mfrow = c(1, 2), mar = c(4, 5, 2, 2))
hist(tt_tumor$P.Value, xlab = "Raw P-values", main = "", las = 1)
qqt(fit2$t[, 5], df = fit2$df.prior + fit2$df.residual[5], main = "", pch = ".",
cex = 3)
qqline(fit2$t[, 5], lwd = 2, col = "red")
abline(0, 1, lwd = 2)
mtext("Comparing genes between healthy and tumoral patients", outer = TRUE,
cex = 1.5, side = 3, line = -2)
pars <- par(mfrow = c(1, 2), mar = c(4, 5, 2, 2))
hist(tt_females$P.Value, xlab = "Raw P-values", main = "", las = 1)
qqt(fit2$t[, 1], df = fit2$df.prior + fit2$df.residual[1], main = "", pch = ".",
cex = 3)
qqline(fit2$t[, 1], lwd = 2, col = "red")
abline(0, 1, lwd = 2)
mtext("Comparing genes between females patients and healthy.", outer = TRUE,
cex = 1.5, side = 3, line = -2)
pars <- par(mfrow = c(1, 2), mar = c(4, 5, 2, 2))
hist(tt_males$P.Value, xlab = "Raw P-values", main = "", las = 1)
qqt(fit2$t[, 2], df = fit2$df.prior + fit2$df.residual[2], main = "", pch = ".",
cex = 3)
qqline(fit2$t[, 2], lwd = 2, col = "red")
abline(0, 1, lwd = 2)
mtext("Comparing genes between male patients and healthy.", outer = TRUE, cex = 1.5,
side = 3, line = -2)
We can see in volanco plots how look like the data
volcano <- function(data, title) {
# Plot a volcano plot of a toptable output of limma
ggplot(data, aes(logFC, B, colour = adj.P.Val)) + geom_point() + b + ggtitle(title) +
geom_vline(xintercept = c(2, -2), col = "red") + geom_hline(yintercept = 4.6,
col = "red")
}
volcano(tt_tumor, "Comparing healthy vs tumoral")
volcano(tt_gender, "Comparing males vs females")
volcano(tt_males, "Comparing males: healthy vs tumoral")
volcano(tt_females, "Comparing females: healthy vs tumoral")
The number of genes which are differentialed expressed can be calculated with:
de_tumor <- subset(tt_tumor, abs(logFC) >= lfc_threshold & B >= 4.6)
de_gender <- subset(tt_gender, abs(logFC) >= lfc_threshold & B >= 4.6)
de_males <- subset(tt_males, abs(logFC) >= lfc_threshold & B >= 4.6)
de_females <- subset(tt_females, abs(logFC) >= lfc_threshold & B >= 4.6)
# Number of genes which pass the filter
de_genes <- c(nrow(de_tumor), nrow(de_gender), nrow(de_males), nrow(de_females))
names(de_genes) <- c("Normal vs Disease", "Females vs Males", "Males", "Females")
de_genes
Normal vs Disease Females vs Males Males Females
1260 2 288 371
As we can see there aren’t any gene of the comparison male vs female that pass the filter. We can compare if in the different comparisons such genes remain allways up-regulated or down-regulated. But we create some function to make it easier to compare several lists:
up_down <- function(data, names = NULL) {
# Classify the genes in up or down, names are the names of the genes and
# should be on the same order
dif_genes <- ifelse(data$logFC > 0, "up", "down")
if (is.null(names)) {
names(dif_genes) <- rownames(data)
} else {
names(dif_genes) <- names
}
dif_genes
}
classify <- function(regulated1, regulated2) {
# Check how much up regulated genes in one comparison are down regulated in
# others
inter <- intersect(names(regulated1), names(regulated2))
genes <- c()
genes2 <- c()
# Reorder them by the name for the comparison with table
for (name in inter) {
genes <- c(genes, regulated1[name])
genes2 <- c(genes2, regulated2[name])
}
# Perform the comparison
table(genes, genes2, dnn = c(deparse(substitute(regulated1)), deparse(substitute(regulated2))))
}
genes_names <- function(regulated1, regulated2) {
# Extract the names of those up-regulated, down-regulated or not clear in
# both lists
inter <- intersect(names(regulated1), names(regulated2))
up <- c()
down <- c()
not_clear <- c()
for (name in inter) {
if (regulated1[name] == regulated2[name]) {
if (regulated1[name] == "up") {
up <- c(up, name)
} else {
down <- c(down, name)
}
} else {
not_clear <- c(not_clear, name)
}
}
list(up = up, down = down, not_clear = not_clear)
}
The function up_down classify the genes in up-regulated and down-regulated. The function classify given two lists of genes up or down regulates, find the genes common in them and check if in both lists are up-regulated or down-regulated, however it doesn’t display the names, just the numbers. However the function genes_names classify the genes on the three elements of the list: up if in both sets is up-regulated, down if in both sets the genes is down-regulated and not_clear if in a dataset the gene is up-regulated and in the other is down-regulated. We can see here the results of several comparisons:
r_tumor <- up_down(de_tumor)
r_females <- up_down(de_females)
r_males <- up_down(de_males)
# We collect the name of genes for further studies
genes_females_males <- genes_names(r_females, r_males)
genes_tumor_males <- genes_names(r_tumor, r_males)
genes_tumor_females <- genes_names(r_tumor, r_females)
# We see how do they overlap, more up-regulated, or more dow-regulated
# genes.
classify(r_tumor, r_males)
r_males
r_tumor down up
down 3 2
classify(r_tumor, r_females)
r_females
r_tumor down up
down 4 4
up 2 1
classify(r_females, r_males)
r_males
r_females down up
down 123 0
up 0 126
We can observe that there are 2 genes which depending on the comparison they change the direction from up-regulated to dow-regulated or vice versa. We have already obtained list of genes classified by the behavior in two comparisons with genes_names function. Furthermore we can extract all the names of the other comparisons to see how much they overlap:
DE_tumor <- unique(rownames(de_tumor))
DE_males <- unique(rownames(de_males))
DE_females <- unique(rownames(de_females))
To visualize it we plot them with venn diagrams:
library("Vennerable")
plot(Venn(list(DE_tumor, DE_females), SetNames = c("Tumoral", "Females")))
plot(Venn(list(DE_tumor, DE_males), SetNames = c("Tumoral", "Males")))
plot(Venn(list(DE_females, DE_males), SetNames = c("Females", "Males")))
As we can observe of the comparison, many genes differentially expressed on females comparing healthy samples and tumoral ones are not found on males.
We can perform a functional enrichment on our data with the GO already stored
library(GOstats)
go_obj <- function(genes_id) {
# Create an object for GOstat from ENTREZIDs
new("GOHyperGParams", geneIds = genes_id, universeGeneIds = unique(all.Human.GO$ENTREZID),
annotation = "org.Hs.eg.db", ontology = "BP", pvalueCutoff = 0.05, testDirection = "over",
conditional = TRUE)
}
params_tumor <- go_obj(DE_tumor)
Warning in makeValidParams(.Object): removing geneIds not in
universeGeneIds
params_males <- go_obj(DE_males)
params_females <- go_obj(DE_females)
# The set of differentially expressed only in females
mask <- DE_females %in% intersect(DE_males, DE_females)
DE_females_exc <- DE_females[!mask]
DE_females_exc <- DE_females_exc[!DE_females_exc %in% intersect(DE_females_exc,
DE_tumor)]
params_female_exc <- go_obj(DE_females_exc)
# We explore those genes that are shared among the comparisons
go_females_males_up <- go_obj(genes_females_males$up)
go_females_males_down <- go_obj(genes_females_males$down)
go_tumor_males_not_clear <- go_obj(genes_tumor_males$not_clear)
go_tumor_females_not_clear <- go_obj(genes_tumor_females$not_clear)
go_tumor_females_down <- go_obj(genes_tumor_females$down)
Once prepared with the object we can run the analysis:
go_analisys <- function(go_obj, annots.GO, size = 5, count = 5, pvalue = 0.05) {
hgOver <- hyperGTest(go_obj)
GO <- GOstats::summary(hgOver)
GO <- subset(GO, Size >= size & Count >= count & Pvalue <= pvalue)
if (nrow(GO) == 0) {
warning("Not significant GO on this group with these settings.")
return(NULL)
}
# Visualize the data
plotgo <- ggplot(GO, aes(Size, Count, size = OddsRatio, colour = Pvalue)) +
geom_point() + b + ggtitle(deparse(substitute(go_obj)))
GO <- GO[order(GO$OddsRatio, decreasing = TRUE), ]
resume <- function(id, annot) {
a <- unique(annot[annot$ENTREZID == id, "SYMBOL"])
if (length(a) >= 1) {
return(a)
} else {
return("")
}
}
geneIDs <- geneIdsByCategory(hgOver)[GO$GOBPID]
geneSYMs <- sapply(geneIDs, resume, annot = annots.GO)
geneSYMs <- sapply(geneSYMs, paste, collapse = ", ")
GO <- cbind(GO, Genes = geneSYMs)
rownames(GO) <- 1:nrow(GO)
list(GO = GO, plot = plotgo)
}
GO_tumor <- go_analisys(params_tumor, all.Human.GO)
if (!is.null(GO_tumor)) {
print(head(GO_tumor$GO))
plot(GO_tumor$plot)
}
GOBPID Pvalue OddsRatio ExpCount Count Size
1 GO:0060416 8.702355e-05 16.801486 0.5227697 5 11
2 GO:0051597 1.463559e-04 14.333635 0.5728010 5 12
3 GO:0031065 2.285008e-04 12.541139 0.6205344 5 13
4 GO:0071276 4.922400e-04 10.031646 0.7160012 5 15
5 GO:0050764 2.447552e-04 8.661856 0.9489974 6 20
6 GO:0003301 9.366368e-04 8.358650 0.8114680 5 17
Term
1 response to growth hormone
2 response to methylmercury
3 positive regulation of histone deacetylation
4 cellular response to cadmium ion
5 regulation of phagocytosis
6 physiological cardiac muscle hypertrophy
Genes
1 BDH1, HMGCS2, IGFBP5, SLC34A1, SRD5A1
2 ALAD, ANK2, CCNE1, FECH, HMBS
3 BCL6, CTBP1, PML, TGFB1, TP53
4 GSN, HMOX1, MT1F, MT1X, MT3
5 FGR, HCK, PLD1, PRKCG, PTEN, TGFB1
6 AGTR2, CAV3, MTOR, PIN1, COL14A1
We should do the same if we had some genes differentially expressed on the comparison between gender.
GO_female_exc <- go_analisys(params_female_exc, all.Human.GO)
if (!is.null(GO_female_exc)) {
print(head(GO_female_exc$GO))
plot(GO_female_exc$plot)
}
GOBPID Pvalue OddsRatio ExpCount Count Size
1 GO:0030814 0.0009796718 7.132812 0.7580306 5 125
2 GO:0030802 0.0010894644 6.957571 0.7762234 5 128
3 GO:1900371 0.0013361195 6.631541 0.8126088 5 134
4 GO:0007188 0.0018356558 6.150704 0.8732513 5 144
5 GO:0052652 0.0021942796 5.894037 0.9096367 5 150
6 GO:0006140 0.0087135453 4.195146 1.2613630 5 208
Term
1 regulation of cAMP metabolic process
2 regulation of cyclic nucleotide biosynthetic process
3 regulation of purine nucleotide biosynthetic process
4 adenylate cyclase-modulating G-protein coupled receptor signaling pathway
5 cyclic purine nucleotide metabolic process
6 regulation of nucleotide metabolic process
Genes
1 APLP1, FSHR, GALR1, INSL3, VIP
2 APLP1, FSHR, GALR1, INSL3, VIP
3 APLP1, FSHR, GALR1, INSL3, VIP
4 APLP1, FSHR, GALR1, INSL3, VIP
5 APLP1, FSHR, GALR1, INSL3, VIP
6 APLP1, FSHR, GALR1, INSL3, VIP
GO_females_males_up <- go_analisys(go_females_males_up, all.Human.GO)
if (!is.null(GO_females_males_up)) {
print(head(GO_females_males_up$GO))
plot(GO_females_males_up$plot)
}
GOBPID Pvalue OddsRatio ExpCount Count Size
1 GO:1903036 0.002807195 5.555970 0.9629541 5 162
2 GO:0034329 0.002612829 4.661719 1.3790453 6 232
3 GO:0032103 0.003219078 4.464198 1.4390464 6 249
4 GO:0060326 0.012681315 3.809257 1.3849895 5 233
5 GO:0050727 0.007194556 3.750230 1.7000300 6 286
6 GO:0051056 0.025800788 3.149130 1.6643651 5 280
Term
1 positive regulation of response to wounding
2 cell junction assembly
3 positive regulation of response to external stimulus
4 cell chemotaxis
5 regulation of inflammatory response
6 regulation of small GTPase mediated signal transduction
Genes
1 RPS19, IL23A, IL17RB, IL17RC, GBP5
2 CDH6, CDH9, COL16A1, CLDN12, PARD6A, CLDN19
3 RPS19, CXCL5, IL23A, IL17RB, IL17RC, GBP5
4 RPS19, CXCL5, VAV3, IL23A, IL17RC
5 RPS19, SPN, IL23A, IL17RB, IL17RC, GBP5
6 GDI1, CYTH2, VAV3, RALGAPA2, PSD2
GO_females_males_down <- go_analisys(go_females_males_down, all.Human.GO)
if (!is.null(GO_females_males_down)) {
print(head(GO_females_males_down$GO))
plot(GO_females_males_down$plot)
}
GOBPID Pvalue OddsRatio ExpCount Count Size
1 GO:1904062 0.007177327 4.410464 1.203542 5 211
2 GO:0032956 0.020295915 3.364519 1.562894 5 274
3 GO:0051049 0.027373182 3.116460 1.695983 5 340
4 GO:0006874 0.017866848 3.051033 2.076253 6 364
5 GO:0006913 0.011217530 3.049368 2.441309 7 428
6 GO:0008015 0.008389478 2.959666 2.891924 8 507
Term
1 regulation of cation transmembrane transport
2 regulation of actin cytoskeleton organization
3 regulation of transport
4 cellular calcium ion homeostasis
5 nucleocytoplasmic transport
6 blood circulation
Genes
1 FXYD2, KCNS1, PIK3C2A, JPH2, JPH3
2 CCL21, CCL24, TAC1, SORBS3, AP1AR
3 AVPR2, PICALM, INPP4B, NUP37, RAB3C
4 CCL21, STC1, TAC1, INPP4B, JPH2, JPH3
5 SFN, KPNA1, INPP4B, RANBP6, NXT1, PRICKLE1
6 FXYD2, AVPR2, CHRM1, CXADR, ELN, PIK3C2A, STC1, TAC1
GO_tumor_males_not_clear <- go_analisys(go_tumor_males_not_clear, all.Human.GO)
Warning in go_analisys(go_tumor_males_not_clear, all.Human.GO): Not
significant GO on this group with these settings.
if (!is.null(GO_tumor_males_not_clear)) {
print(head(GO_tumor_males_not_clear$GO))
plot(GO_tumor_males_not_clear$plot)
}
GO_tumor_females_not_clear <- go_analisys(go_tumor_females_not_clear, all.Human.GO)
Warning in go_analisys(go_tumor_females_not_clear, all.Human.GO): Not
significant GO on this group with these settings.
if (!is.null(GO_tumor_females_not_clear)) {
print(head(GO_tumor_females_not_clear$GO))
plot(GO_tumor_females_not_clear$plot)
}
GO_tumor_females_down <- go_analisys(go_tumor_females_down, all.Human.GO)
Warning in go_analisys(go_tumor_females_down, all.Human.GO): Not
significant GO on this group with these settings.
if (!is.null(GO_tumor_females_down)) {
print(head(GO_tumor_females_down$GO))
plot(GO_tumor_females_down$plot)
}
GO_females <- go_analisys(params_females, all.Human.GO)
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
if (!is.null(GO_females)) {
print(head(GO_females$GO))
plot(GO_females$plot)
}
GOBPID Pvalue OddsRatio ExpCount Count Size
1 GO:0098801 0.000574074 8.354320 0.6867607 5 38
2 GO:0030593 0.004921503 4.086127 1.5723206 6 87
3 GO:0001843 0.006120260 3.892881 1.6446112 6 91
4 GO:0030817 0.003460009 3.831447 1.9518463 7 108
5 GO:0097530 0.004026735 3.720238 2.0060642 7 111
6 GO:0050830 0.014170681 3.716216 1.4277394 5 79
Term
1 regulation of renal system process
2 neutrophil chemotaxis
3 neural tube closure
4 regulation of cAMP biosynthetic process
5 granulocyte migration
6 defense response to Gram-positive bacterium
Genes
1 AGTR2, AVPR2, GAS6, STC1, TAC1
2 CXADR, CCL20, CCL21, CCL24, VAV3, IL23A
3 ADM, STIL, FZD6, SETD2, VANGL2, PRICKLE1
4 ADM, APLP1, AVPR2, FSHR, GALR1, INSL3, VIP
5 CXADR, CCL20, CCL21, CCL24, VAV3, IL23A, IL17RC
6 ADM, TNFSF8, TAC1, VIP, TNFAIP8
GO_males <- go_analisys(params_males, all.Human.GO)
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
Warning in annot$ENTREZID == id: longer object length is not a multiple of
shorter object length
if (!is.null(GO_males)) {
print(head(GO_males$GO))
plot(GO_males$plot)
}
GOBPID Pvalue OddsRatio ExpCount Count Size
1 GO:0071621 0.002606787 4.661530 1.377965 6 102
2 GO:1990266 0.008934311 4.172880 1.269889 5 94
3 GO:0051495 0.009254627 4.134361 1.280773 5 96
4 GO:1903036 0.006526480 3.371560 2.188532 7 162
5 GO:0003018 0.008157799 3.224487 2.283098 7 169
6 GO:0009063 0.024224260 3.196317 1.634644 5 121
Term
1 granulocyte chemotaxis
2 neutrophil migration
3 positive regulation of cytoskeleton organization
4 positive regulation of response to wounding
5 vascular process in circulatory system
6 cellular amino acid catabolic process
Genes
1 CXADR, CCL21, CCL24, VAV3, IL23A, IL17RC
2 CXADR, CCL21, CCL24, VAV3, IL23A
3 CFL2, SPAST, TAC1, SORBS3, KISS1R
4 RPS19, CCL24, TAC1, IL23A, IL17RB, IL17RC, GBP5
5 ADM, ADORA1, ADRA1B, AVPR2, CHRM1, PIK3C2A, TACR2
6 GLDC, GOT2, DDO, ALDH4A1, PCYOX1
We should look in details to those values and associations in the tables.
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats4 parallel stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] GO.db_3.3.0 GOstats_2.38.0
[3] graph_1.50.0 Category_2.38.0
[5] Matrix_1.2-6 Vennerable_3.1.0.9000
[7] org.Hs.eg.db_3.3.0 corpcor_1.6.8
[9] sva_3.20.0 genefilter_1.54.2
[11] mgcv_1.8-12 nlme_3.1-128
[13] cqn_1.18.0 quantreg_5.26
[15] SparseM_1.7 preprocessCore_1.34.0
[17] nor1mix_1.2-1 mclust_5.2
[19] geneplotter_1.50.0 annotate_1.50.0
[21] XML_3.98-1.4 AnnotationDbi_1.34.3
[23] lattice_0.20-33 edgeR_3.14.0
[25] limma_3.28.6 ggplot2_2.1.0
[27] SummarizedExperiment_1.2.2 Biobase_2.32.0
[29] GenomicRanges_1.24.1 GenomeInfoDb_1.8.1
[31] IRanges_2.6.0 S4Vectors_0.10.1
[33] BiocGenerics_0.18.0 knitr_1.13
loaded via a namespace (and not attached):
[1] statmod_1.4.24 reshape2_1.4.1 colorspace_1.2-6
[4] htmltools_0.3.5 yaml_2.1.13 survival_2.39-4
[7] RBGL_1.48.1 DBI_0.4-1 RColorBrewer_1.1-2
[10] plyr_1.8.4 stringr_1.0.0 zlibbioc_1.18.0
[13] MatrixModels_0.4-1 munsell_0.4.3 gtable_0.2.0
[16] codetools_0.2-14 evaluate_0.9 labeling_0.3
[19] GSEABase_1.34.0 Rcpp_0.12.5 KernSmooth_2.23-15
[22] xtable_1.8-2 scales_0.4.0 formatR_1.4
[25] XVector_0.12.0 digest_0.6.9 stringi_1.1.1
[28] grid_3.3.0 tools_3.3.0 magrittr_1.5
[31] RSQLite_1.0.0 rmarkdown_0.9.6 AnnotationForge_1.14.2